11 research outputs found

    SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval

    Full text link
    To address 3D object retrieval, substantial efforts have been made to generate highly discriminative descriptors of 3D objects represented by a single modality, e.g., voxels, point clouds or multi-view images. It is promising to leverage the complementary information from multi-modality representations of 3D objects to further improve retrieval performance. However, multi-modality 3D object retrieval has rarely been developed and analyzed on large-scale datasets. In this paper, we propose self-and-cross attention based aggregation of point cloud and multi-view images (SCA-PVNet) for 3D object retrieval. With deep features extracted from point clouds and multi-view images, we design two types of feature aggregation modules, namely the In-Modality Aggregation Module (IMAM) and the Cross-Modality Aggregation Module (CMAM), for effective feature fusion. IMAM leverages a self-attention mechanism to aggregate multi-view features, while CMAM exploits a cross-attention mechanism to let point cloud features interact with multi-view features. The final descriptor of a 3D object for retrieval is obtained by concatenating the aggregated features from both modules. Extensive experiments and analysis are conducted on three datasets, ranging from small to large scale, to show the superiority of the proposed SCA-PVNet over state-of-the-art methods.
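
The two aggregation steps in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain scaled dot-product attention with no learned projections, and the function names (`attend`, `fuse_descriptor`), the mean pooling over views, and the choice of the point cloud feature as the query are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention without learned projections (assumption)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def mean_pool(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]


def fuse_descriptor(view_feats, point_feat):
    """Hypothetical helper: concatenate in-modality and cross-modality features."""
    # In-modality step: the views attend to each other, then are averaged.
    in_modality = mean_pool(attend(view_feats, view_feats, view_feats))
    # Cross-modality step: the point cloud feature queries the view features.
    cross_modality = attend([point_feat], view_feats, view_feats)[0]
    return in_modality + cross_modality
```

With three 2-dimensional view features and one 2-dimensional point cloud feature, `fuse_descriptor` returns a 4-dimensional descriptor, which reflects the concatenation described in the abstract.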

    Attire detection and retrieval based on region proposals with convolutional neural network

    No full text
    Region Proposals with Convolutional Neural Network Features (RCNN), an object detection algorithm, performs well on the Visual Object Classes Challenge 2012 [1]. There are two main approaches to improving its performance. The first is to apply a high-capacity Convolutional Neural Network (CNN) to region proposals to localize and segment the object. The other is to perform supervised pre-training when the labelled data is insufficient. The goal of this project is to build an attire detection system using Region Proposals with Convolutional Neural Network Features. In order to study RCNN, we introduce some concepts related to it. We explain the definitions of object detection, Neural Network (NN) and Convolutional Neural Network (CNN) in detail. The description of RCNN contains two parts. The first part is the method of region proposal, and the second part is the CNN architecture. Then we describe the attire detection system and the process of dataset construction in detail. Finally, we summarize and discuss the testing results. The testing results show that RCNN has a good performance on attire object detection. The mean average precision (mAP) over all categories is 57.26%. Based on the testing results, we find that the quality and amount of training data have a great effect on the performance of the attire detection system.
    Master of Science (Signal Processing)
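
The mAP figure quoted above is an average of per-category average precision (AP) values. The sketch below shows one common, non-interpolated way to compute it. The abstract does not state which AP protocol the thesis used, so the formula here is an assumption, and the function names and category names in the usage note are hypothetical.

```python
def average_precision(hits, n_ground_truth):
    """Non-interpolated AP for one category.

    hits: detections sorted by descending confidence, True where the
          detection matches a ground-truth object.
    n_ground_truth: number of ground-truth objects in that category.
    """
    if n_ground_truth == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            # Precision measured at the rank of each correct detection.
            precision_sum += true_positives / rank
    return precision_sum / n_ground_truth


def mean_average_precision(per_category):
    """Average the per-category AP values.

    per_category maps a category name to a (hits, n_ground_truth) pair.
    """
    aps = [average_precision(h, n) for h, n in per_category.values()]
    return sum(aps) / len(aps)
```

For example, calling `mean_average_precision({"shirt": ([True, False, True], 2), "skirt": ([False, True], 1)})` averages an AP of about 0.833 for the first category and 0.5 for the second.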

    Missing Data Estimation for Traffic Volume by Searching an Optimum Closed Cut in Urban Networks

    No full text

    Deep residual pooling network for texture recognition

    No full text
    Current deep learning-based texture recognition methods extract spatial orderless features from pre-trained deep learning models that are trained on large-scale image datasets. These methods either produce high dimensional features or have multiple steps like dictionary learning, feature encoding and dimension reduction. In this paper, we propose a novel end-to-end learning framework that not only overcomes these limitations, but also demonstrates faster learning. The proposed framework incorporates a residual pooling layer consisting of a residual encoding module and an aggregation module. The residual encoder preserves the spatial information for improved feature learning, and the aggregation module generates an orderless feature for classification through simple averaging. The feature has the lowest dimension among previous deep texture recognition approaches, yet it achieves state-of-the-art performance on benchmark texture recognition datasets such as FMD, DTD, 4D Light and one industry dataset used for metal surface anomaly detection. Additionally, the proposed method obtains comparable results on the MIT-Indoor scene recognition dataset. Our code is available at https://github.com/maoshangbo/DRP-Texture-Recognition. This work was conducted within the Rolls-Royce@NTU Corporate Lab under the project DACS 2.1: Artificial Intelligence (AI) for Smart Image Understanding with support from the Industry Alignment Fund (IAF) Singapore under the Corp Lab@University Scheme.
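
The idea of encoding residuals per spatial position and then averaging them into an orderless feature can be sketched as follows. This is only an illustration under stated assumptions: the paper's modules are learned layers, whereas here the reference features are given as input, the residual is rectified with a simple `max(., 0)`, and the name `residual_pool` is hypothetical.

```python
def residual_pool(features, references):
    """Average rectified residuals over spatial positions.

    features:   one feature vector per spatial position.
    references: one reference vector per spatial position (assumed given;
                in a trained network these would come from learned layers).
    The result has as many entries as there are channels.
    """
    n_positions = len(features)
    n_channels = len(features[0])
    pooled = [0.0] * n_channels
    for feat, ref in zip(features, references):
        for j in range(n_channels):
            # Residual is computed position by position, so the spatial
            # correspondence between feature and reference is preserved.
            pooled[j] += max(feat[j] - ref[j], 0.0)
    # Averaging discards the ordering of positions (orderless feature).
    return [value / n_positions for value in pooled]
```

Because the final step is an average, reordering the spatial positions (features and references together) leaves the descriptor unchanged, and its length equals the channel count regardless of image size.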

    Unsupervised feature learning with sparse Bayesian auto-encoding based extreme learning machine

    No full text
    Extreme learning machine (ELM) is a popular method in machine learning with extremely few parameters, fast learning speed and model efficiency. ELM-based unsupervised feature learning is receiving growing research attention. Recently the ELM auto-encoder (ELM-AE) was proposed for this task, which develops ELM-based compact feature learning without sacrificing its elegant solution. In contrast to ELM-AE and the subsequent ℓ1-regularized ELM-AE, we introduce a sparse Bayesian learning scheme into ELM-AE for better generalization capability. A parallel training strategy is also integrated to improve the time-efficiency of multi-output sparse Bayesian learning. Furthermore, hidden nodes are pruned for better performance and efficiency according to the estimated variances of the prior distribution of the output weights. Experiments on several datasets verify the effectiveness and efficiency of our proposed ELM-AE for unsupervised feature learning, compared with PCA, NMF, ELM-AE and ℓ1-regularized ELM-AE.
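
For orientation, the baseline ELM-AE that this abstract builds on can be sketched in a few lines: the hidden layer is random and fixed, and only the output weights are solved in closed form so that the hidden activations reconstruct the input. The sketch below uses a ridge (ℓ2) penalty and omits the sparse Bayesian scheme, the pruning step and the orthogonalization of random weights; the `tanh` activation, the parameter names and the small linear solver are assumptions chosen to keep the example self-contained.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def elm_ae_features(x, n_hidden, ridge=1e-2, seed=0):
    """Baseline ELM-AE sketch: random hidden layer, ridge-solved output weights."""
    rng = random.Random(seed)
    n_in = len(x[0])
    # Random, untrained input weights and biases (the "extreme" part).
    w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    bias = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [[math.tanh(v + b) for v, b in zip(row, bias)] for row in matmul(x, w)]
    # Output weights beta minimize ||h @ beta - x||^2 + ridge * ||beta||^2.
    ht = transpose(h)
    gram = matmul(ht, h)
    for i in range(n_hidden):
        gram[i][i] += ridge
    beta = solve(gram, matmul(ht, x))  # shape: n_hidden x n_in
    # Learned features: project the data with the transposed output weights.
    return matmul(x, transpose(beta))  # shape: n_samples x n_hidden
```

Calling `elm_ae_features(data, n_hidden=2)` on a list of 3-dimensional samples returns one 2-dimensional feature vector per sample.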

    Towards Enhanced Recovery and System Stability: Analytical Solutions for Dynamic Incident Effects in Road Networks

    No full text

    R-ELMNet: regularized extreme learning machine network

    No full text
    Principal component analysis network (PCANet), as an unsupervised shallow network, demonstrates noticeable effectiveness on datasets of various volumes. It carries a two-layer convolution with PCA as the filter learning method, followed by a block-wise histogram post-processing stage. Following the structure of PCANet, extreme learning machine auto-encoder (ELM-AE) variants are employed to take over the role of PCA, which come from extreme learning machine network (ELMNet) and hierarchical ELMNet. ELMNet emphasizes the importance of orthogonal projection while overlooking non-linearity. The latter introduces complex pre-processing to overcome the drawback of non-linear ELM-AE. In this paper, we analyze intrinsic characteristics of ELM-AE variants and accordingly propose a regularized ELM-AE, which combines non-linearity learning capability and approximately orthogonal projection. Experiments on image classification show its effectiveness compared to supervised convolutional neural networks and related shallow networks on unsupervised feature learning.